Constant Compositions in the Sphere Packing Bound for Classical-Quantum Channels

Marco Dalai, Member, IEEE, Andreas Winter 


Abstract 

The sphere packing bound, in the form given by Shannon, Gallager and Berlekamp, was recently extended to
classical-quantum channels, and it was shown that this creates a natural setting for combining probabilistic approaches
with some combinatorial ones such as the Lovász theta function. In this paper, we extend the study to the case of
constant composition codes. We first extend the sphere packing bound for classical-quantum channels to this case, and
we then show that the obtained result is related to a variation of the Lovász theta function studied by Marton. We then
propose a further extension to the case of varying channels and codewords with a constant conditional composition
given a particular sequence. This extension is then applied to auxiliary channels to deduce a bound which can be
interpreted as an extension of the Elias bound.


I. Introduction 

The sphere packing bound has been recently extended to classical-quantum channels [2], [3, Sec. V] by resorting
to the first rigorous proof given for the case of classical discrete memoryless channels (DMC) by Shannon, Gallager
and Berlekamp [4]. That resulted in an upper bound on the reliability function of classical-quantum channels, which
is the error exponent achievable by means of optimal codes.

The classical proof given in [4] can be considered a rigorous completion of Fano's first efforts toward proving
the bound [5, Ch. 9]. However, while Fano's approach led to a tight exponent at high rates for general constant
composition codes, the proof in [4] only considers the case of the optimal composition. Shortly afterwards,
Haroutunian [6], [7] proposed a simple yet rigorous proof which gives the tight exponent for codes with general
(possibly non-optimal) constant composition. However, a greedy extension of this proof to classical-quantum
channels does not give a good bound (see [8, Th. 11.20 and page 35]). This motivated the choice made in [2],
[3] to follow the approach of [4].

In this paper, we slightly modify the approach in [2], [3] to derive a sphere packing bound for classical-quantum
channels with constant composition codes. The main difference with respect to the classical case is in the resulting
possible analytical expressions of the bound, which does not seem to be expressible, in this case, in terms of the
Kullback-Leibler divergence and mutual information. In analogy with the results obtained in [9], [3, Sec. VI], we
then discuss the connections of the constant composition version of the bound with a quantity introduced by Marton
[10] as a generalization of the Lovász theta function for bounding the highest rate achievable by zero-error codes
with codewords of a given arbitrary composition. Finally, we propose an extension of the sphere packing bound
for varying channels and codewords with a constant conditional composition from a given sequence, and we show
that this result includes as a special case a recently developed generalization of the Elias bound [11].

II. Definitions 

Consider a classical-quantum channel $\mathfrak{C}$ with input alphabet $\mathcal{X} = \{1,\ldots,|\mathcal{X}|\}$ and associated density operators
$S_x$, $x \in \mathcal{X}$, in a finite dimensional Hilbert space $\mathcal{H}$. The $n$-fold product channel acts in the tensor product space
$\mathcal{H}^{\otimes n}$ of $n$ copies of $\mathcal{H}$. To a sequence $\boldsymbol{x} = (x_1, x_2, \ldots, x_n)$ we associate the signal state $S_{\boldsymbol{x}} = S_{x_1} \otimes S_{x_2} \otimes \cdots \otimes S_{x_n}$.
A block code with $M$ codewords is a mapping from a set of $M$ messages $\{1,\ldots,M\}$ into a set of $M$ codewords
$\boldsymbol{x}_1,\ldots,\boldsymbol{x}_M$, and the rate of the code is $R = (\log M)/n$.

We consider a quantum decision scheme for such a code (POVM) composed of a collection of $M$ positive
operators $\{\Pi_1, \Pi_2, \ldots, \Pi_M\}$ such that $\sum_m \Pi_m \le \mathbb{1}$, where $\mathbb{1}$ is the identity operator. The probability that message
$m'$ is decoded when message $m$ is transmitted is $P_{m'|m} = \operatorname{Tr} \Pi_{m'} S_{\boldsymbol{x}_m}$, and the probability of error after sending
message $m$ is

$$P_{e|m} = 1 - \operatorname{Tr}(\Pi_m S_{\boldsymbol{x}_m}).$$

The maximum error probability of the code is defined as the largest $P_{e|m}$, that is,

$$P_{e,\max} = \max_m P_{e|m}.$$

In this paper, we are interested in bounding the probability of error for constant composition codes. Given a
composition $P_n$, we define $P_{e,\max}^{(n)}(R, P_n)$ to be the smallest maximum error probability among all codes of length
$n$, rate at least $R$, and composition $P_n$. For a probability distribution $P$, we define the asymptotic optimal error
exponent with composition $P$ as

$$E(R,P) = \limsup_{n\to\infty} -\frac{1}{n} \log P_{e,\max}^{(n)}(R_n, P_n), \tag{1}$$

where the limsup is over all sequences of codes with rates $R_n$ and compositions $P_n$ such that $R_n \to R$ and $P_n \to P$
as $n \to \infty$. For channels with a zero-error capacity, the function $E(R,P)$ can be infinite for rates $R$ smaller than
some given quantity $C_0(P)$, which we can call the zero-error capacity of the channel relative to $P$. It is important
to observe that, as for $C_0$, the value $C_0(P)$ only depends on the confusability graph $G$ of the channel, for which
we could also call it $C(G,P)$ [12], [10].
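As a concrete illustration of the notion of composition used above, the following minimal Python sketch computes the composition of each codeword of a small code and checks whether the code has constant composition. The toy code, the alphabet and the function names are hypothetical and only serve the illustration; the rate is computed in nats.

```python
import math
from collections import Counter
from fractions import Fraction


def composition(codeword, alphabet):
    """Empirical distribution of the symbols of one codeword, as exact fractions."""
    counts = Counter(codeword)
    n = len(codeword)
    return {x: Fraction(counts[x], n) for x in alphabet}


def is_constant_composition(code, alphabet):
    """True when every codeword of the code has the same composition."""
    reference = composition(code[0], alphabet)
    return all(composition(cw, alphabet) == reference for cw in code)


# Hypothetical toy code: alphabet {1, 2, 3}, block length n = 4, M = 3 codewords.
alphabet = [1, 2, 3]
code = [(1, 1, 2, 3), (1, 2, 1, 3), (3, 1, 2, 1)]

P_n = composition(code[0], alphabet)              # composition shared by the code
constant = is_constant_composition(code, alphabet)
rate = math.log(len(code)) / len(code[0])         # R = (log M) / n, in nats
```

Exact fractions are used so that two compositions can be compared without rounding issues.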

To avoid unnecessary complications, we use a flexible notation in this paper. We keep it simple as far as possible, 
progressively increasing its complexity by adding arguments to functions as their definitions become more general.
The meaning of all quantities will be clear from the context. 
III. Sphere Packing Bound for Constant Composition Codes

Our main result is the following theorem.

Theorem 1: For all positive rates $R$, distribution $P$, and positive $\varepsilon < R$, we have the bound

$$E(R,P) \le E_{sp}^{cc}(R-\varepsilon, P),$$

where $E_{sp}^{cc}(R,P)$ is defined by the relations

$$E_{sp}^{cc}(R,P) = \sup_{\rho \ge 0} \left[ E_0^{cc}(\rho,P) - \rho R \right], \tag{2}$$

$$E_0^{cc}(\rho,P) = \min_F \left[ -(1+\rho) \sum_x P(x) \log \operatorname{Tr}\left( S_x^{1/(1+\rho)} F^{\rho/(1+\rho)} \right) \right], \tag{3}$$

the minimum being over all density operators $F$.

Proof: See Appendix A. ■

The bound is written here in terms of Rényi divergences. For commuting states, that is, classical channels, the
bound can be written in the more usual form in terms of Kullback-Leibler divergences and mutual information as
in [7]. In fact, assuming that the states $S_x$ commute, let for notational convenience $W(y|x)$ be their eigenvalues,
which we interpret as classical probability distributions, indexing in $y$ the output space. Then we can write (see [7,
Ch. 5, Prob. 23])

$$E_0^{cc}(\rho,P) = \min_F \left[ -(1+\rho) \sum_x P(x) \log \operatorname{Tr}\left( S_x^{1/(1+\rho)} F^{\rho/(1+\rho)} \right) \right] \tag{4}$$

$$= \min_Q \left[ -(1+\rho) \sum_x P(x) \log \sum_y W(y|x)^{1/(1+\rho)} Q(y)^{\rho/(1+\rho)} \right] \tag{5}$$

$$= \min_Q \min_V \sum_x P(x) \sum_y V(y|x) \left[ \log \frac{V(y|x)}{W(y|x)} + \rho \log \frac{V(y|x)}{Q(y)} \right] \tag{6}$$

$$= \min_V \left[ D(V\|W|P) + \rho I(P,V) \right], \tag{7}$$

where the $V(\cdot|x)$ and $Q$ run over probability distributions on the output space, $I(P,V)$ is the mutual information with the notation
of [7]

$$I(P,V) = \sum_{x,y} P(x) V(y|x) \log \frac{V(y|x)}{\sum_{x'} P(x') V(y|x')} \tag{8}$$

and $D(V\|W|P)$ is the conditional information divergence

$$D(V\|W|P) = \sum_x P(x) \sum_y V(y|x) \log \frac{V(y|x)}{W(y|x)}. \tag{9}$$

Hence, for classical channels, we have the more familiar form of the bound (see [7])

$$E_{sp}^{cc}(R,P) = \sup_{\rho \ge 0} \left[ \min_V \left( D(V\|W|P) + \rho I(P,V) \right) - \rho R \right] \tag{10}$$

$$= \min_{V : I(P,V) \le R} D(V\|W|P). \tag{11}$$
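The classical quantities appearing in (5)-(9) are easy to evaluate numerically. The sketch below, in plain Python with natural logarithms, computes the conditional divergence and the mutual information for channels given as row-stochastic lists, and checks the step from (5) to (6) for one fixed output distribution Q: the inner minimum over V is attained at V(y|x) proportional to W(y|x)^{1/(1+ρ)} Q(y)^{ρ/(1+ρ)}. The example channel, the distributions and all function names are assumptions made only for this illustration.

```python
import math


def cond_divergence(V, W, P):
    """D(V||W|P) = sum_x P(x) sum_y V(y|x) log(V(y|x) / W(y|x)), in nats."""
    total = 0.0
    for x, px in enumerate(P):
        if px == 0:
            continue
        for v, w in zip(V[x], W[x]):
            if v > 0:
                if w == 0:
                    return math.inf
                total += px * v * math.log(v / w)
    return total


def mutual_information(P, V):
    """I(P,V) with output marginal sum_x' P(x') V(y|x'), in nats."""
    n_out = len(V[0])
    marginal = [sum(P[x] * V[x][y] for x in range(len(P))) for y in range(n_out)]
    return sum(
        P[x] * V[x][y] * math.log(V[x][y] / marginal[y])
        for x in range(len(P))
        for y in range(n_out)
        if P[x] > 0 and V[x][y] > 0
    )


def renyi_side(W, P, Q, rho):
    """Bracket of (5) for a fixed output distribution Q."""
    a = 1.0 / (1.0 + rho)
    return -(1.0 + rho) * sum(
        px * math.log(sum(w ** a * q ** (1.0 - a) for w, q in zip(W[x], Q)))
        for x, px in enumerate(P)
    )


def kl_side(W, P, Q, rho):
    """Objective of (6) for the same Q, evaluated at the minimizing V."""
    a = 1.0 / (1.0 + rho)
    V = []
    for row in W:
        g = [w ** a * q ** (1.0 - a) for w, q in zip(row, Q)]
        z = sum(g)
        V.append([gi / z for gi in g])
    cross = sum(
        px * v * math.log(v / q)
        for x, px in enumerate(P)
        for v, q in zip(V[x], Q)
        if v > 0
    )
    return cond_divergence(V, W, P) + rho * cross


# Hypothetical example: binary symmetric channel with crossover probability 0.1.
W = [[0.9, 0.1], [0.1, 0.9]]
P = [0.5, 0.5]
Q = [0.3, 0.7]
rho = 1.5
gap = abs(renyi_side(W, P, Q, rho) - kl_side(W, P, Q, rho))
```

The two sides agree up to floating point error for any strictly positive W and Q; the remaining minimization over Q, which turns the second term into ρ I(P,V), is not carried out here.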
This form of the bound emerges naturally in Haroutunian's proof [6], [7], which is very simple and gives a very
intuitive interpretation of the resulting expression. For a given rate R, one considers auxiliary channels V such
that I(P,V) < R. Given codes with rate R and composition P, by the strong converse to the coding theorem, the
probability of error over channel V for at least one codeword is nearly one. For that same codeword, the probability
of error over channel W can be lower bounded in terms of the Kullback-Leibler divergence D(V||W|P), and this
leads to the sphere packing bound.
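To make this interpretation tangible, the following sketch evaluates the classical expression "minimum of D(V||W|P) over auxiliary channels V with I(P,V) at most R" by brute force for a binary symmetric channel. It is only a coarse grid search under stated assumptions (binary alphabets, a hypothetical crossover probability and composition, natural logarithms), not an efficient or exact optimizer.

```python
import math


def h(p):
    """Binary entropy in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def kl(a, b):
    """Binary divergence D(a||b) in nats, for 0 < b < 1."""
    total = 0.0
    if a > 0.0:
        total += a * math.log(a / b)
    if a < 1.0:
        total += (1.0 - a) * math.log((1.0 - a) / (1.0 - b))
    return total


def mutual_info(q, v0, v1):
    """I(P,V) for P = (q, 1-q) and a binary channel with flip probabilities v0, v1."""
    out1 = q * v0 + (1.0 - q) * (1.0 - v1)
    return h(out1) - q * h(v0) - (1.0 - q) * h(v1)


def sphere_packing_exponent(R, w, q, steps=200):
    """Grid approximation of min D(V||W|P) over V with I(P,V) <= R, W a BSC(w)."""
    best = math.inf
    for i in range(steps + 1):
        v0 = i / steps
        for j in range(steps + 1):
            v1 = j / steps
            if mutual_info(q, v0, v1) <= R:
                best = min(best, q * kl(v0, w) + (1.0 - q) * kl(v1, w))
    return best


# Hypothetical example: BSC with crossover 0.1 and uniform composition.
e_low = sphere_packing_exponent(0.1, 0.1, 0.5)
e_mid = sphere_packing_exponent(0.2, 0.1, 0.5)
e_high = sphere_packing_exponent(0.4, 0.1, 0.5)
```

Because the feasible set grows with R, the grid values are non-increasing in R, and the value vanishes once R exceeds I(P,W), since V = W then becomes feasible.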

It is interesting to consider what happens in the case of non-commuting states. A reasoning similar to the one
described in the last paragraph can be applied to derive a bound which is the formal analog of the classical one in
the form given using equation (11), namely (see [8, Th. 11.20])

$$E(R,P) \le \min_{V : I(P,V) \le R} D(V\|S|P), \tag{12}$$

where now the minimum is over all sets of density operators $V_x$,

$$I(P,V) = H\Big(\sum_x P(x) V_x\Big) - \sum_x P(x) H(V_x), \quad \text{with } H(\rho) = -\operatorname{Tr} \rho \log \rho, \tag{13}$$

and

$$D(V\|S|P) = \sum_x P(x) \operatorname{Tr} V_x (\log V_x - \log S_x). \tag{14}$$

The main difference with respect to the classical case, however, is that this bound does not have good properties
in the more general classical-quantum setting. For example, note that - as in the classical case - the bound is finite
only when the $V_x$ can be chosen so that $\operatorname{supp}(V_x) \subseteq \operatorname{supp}(S_x)$. As a consequence, for pure-state channels the bound
is infinite for rates $R < I(P,S)$, which means that the bound is essentially trivial in this case. The reason for this
unexpected behavior can be traced back to a fundamental difference in the study of error exponents in the classical
and quantum binary hypothesis testing (see for example [13, Sec. 4.8]). A more detailed discussion of this issue
requires an inspection of the proof of the sphere packing bound and is thus deferred to Appendix C.

Now it is not difficult to show that after optimization of the composition we recover the original bound of [2],
[3]. In order to do this, note that, by definition (2),

$$\sup_P E_{sp}^{cc}(R,P) = \sup_{\rho \ge 0} \Big[ \max_P E_0^{cc}(\rho,P) - \rho R \Big].$$

Then,

$$\begin{aligned}
\max_P E_0^{cc}(\rho,P) &= \max_P \min_F \Big[ -(1+\rho) \sum_x P(x) \log \operatorname{Tr}\big( S_x^{1/(1+\rho)} F^{\rho/(1+\rho)} \big) \Big] \\
&= \min_F \max_P \Big[ -(1+\rho) \sum_x P(x) \log \operatorname{Tr}\big( S_x^{1/(1+\rho)} F^{\rho/(1+\rho)} \big) \Big] \\
&= \min_F \max_x \Big[ -(1+\rho) \log \operatorname{Tr}\big( S_x^{1/(1+\rho)} F^{\rho/(1+\rho)} \big) \Big],
\end{aligned}$$

where the minimum and the maximum can be exchanged due to linearity in $P$ and convexity in $F$. The resulting
expression is in fact the coefficient $E_0(\rho)$ which defines the sphere packing bound as proved in [3, Th. 6]. Hence,
this procedure allows us to recover the results of [2], [3] by noticing that

$$E(R) = \sup_P E(R,P) \tag{15}$$

$$\le \sup_P E_{sp}^{cc}(R-\varepsilon, P) \tag{16}$$

$$= E_{sp}(R-\varepsilon). \tag{17}$$

Theorem 1 thus constitutes the most general form of the sphere packing bound, from which all other forms can be
derived.


IV. Connections with Marton’s function 


The bound $E_{sp}^{cc}(R,P)$ obtained in the previous section can be used as an upper bound for the zero-error capacity
of the channel relative to $P$. Whenever the function $E_{sp}^{cc}(R-\varepsilon,P)$ is finite, in fact, then the probability of error at
rate $R$ is non-zero. It is not difficult to observe that the smallest rate $R_\infty(P)$ at which $E_{sp}^{cc}(R,P)$ is finite can be
evaluated as

$$R_\infty(P) = \lim_{\rho \to \infty} \frac{E_0^{cc}(\rho,P)}{\rho} = \min_F \Big[ -\sum_x P(x) \log \operatorname{Tr}(S_x^0 F) \Big],$$

where $S_x^0$ is the projection onto the range of $S_x$. When optimized over $P$, we obtain the expression

$$R_\infty = \min_F \max_x \log \frac{1}{\operatorname{Tr}(S_x^0 F)},$$

already discussed in [3]. Hence, we have the bounds $C_0(P) \le R_\infty(P)$ and $C_0 \le R_\infty$.

It was observed in [9] and [3, Sec. VI] that $R_\infty$ is related to the Lovász number $\vartheta$ [14]. Here, we observe that,
in complete analogy, the value $R_\infty(P)$ is related to a variation of the $\vartheta$ function introduced by Marton in [10] as
an upper bound to $C(G,P)$. Given a (confusability) graph $G$, Marton introduces the following quantity¹:

$$\vartheta(G,P) = \min_{\{u_x\}, f} \sum_x P(x) \log \frac{1}{|\langle u_x | f \rangle|^2}, \tag{18}$$

where the minimum is over all representations $\{u_x\}$ of the graph $G$ in the Lovász sense and over all unit norm
vectors $f$. She then shows that $C(G,P) \le \vartheta(G,P)$.
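As an illustration of the objective in (18), the sketch below evaluates it for the pentagon graph using the classical "umbrella" representation of Lovász: five unit vectors in three dimensions such that vectors of non-adjacent (non-confusable) vertices are orthogonal, with the handle along the third axis. The construction is a standard one, but its use here, the function names and the example compositions are assumptions of this illustration; the value obtained is one feasible point of the minimization, hence only an upper bound on the minimum for a general composition.

```python
import math


def lovasz_umbrella():
    """Five unit vectors in R^3; vectors of non-adjacent pentagon vertices are orthogonal."""
    height_sq = 1.0 / math.sqrt(5.0)          # squared component along the handle
    height = math.sqrt(height_sq)
    radius = math.sqrt(1.0 - height_sq)
    return [
        (radius * math.cos(2 * math.pi * k / 5),
         radius * math.sin(2 * math.pi * k / 5),
         height)
        for k in range(5)
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def marton_objective(P, vectors, handle):
    """sum_x P(x) log(1 / |<u_x|f>|^2) for real vectors, in nats."""
    return sum(
        p * math.log(1.0 / dot(u, handle) ** 2)
        for p, u in zip(P, vectors)
        if p > 0
    )


umbrella = lovasz_umbrella()
handle = (0.0, 0.0, 1.0)
uniform = [0.2] * 5
skewed = [0.5, 0.0, 0.25, 0.0, 0.25]          # hypothetical composition
value_uniform = marton_objective(uniform, umbrella, handle)
value_skewed = marton_objective(skewed, umbrella, handle)
```

Since every umbrella vector has the same overlap with the handle, this particular representation gives the value (1/2) log 5 for every composition.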

Let us now compare this bound with the best bound on $C(G,P)$ that we can deduce from the sphere packing
bound using $R_\infty(P)$. We enforce the notation writing $R_\infty(\{S_x\},P)$ to point out the dependence of $R_\infty(P)$ on the
channel states $S_x$. For a given confusability graph $G$, the best upper bound to $C(G,P)$ is obtained by minimizing
$R_\infty(\{S_x\},P)$ over all possible channels with confusability graph $G$. We may then define

$$\vartheta_{sp}(G,P) = \inf_{\{S_x\}} R_\infty(\{S_x\},P) \tag{19}$$

$$= \inf_{\{U_x\}, F} \sum_x P(x) \log \frac{1}{\operatorname{Tr}(U_x F)}, \tag{20}$$

where $\{U_x\}$ now runs over all sets of projectors with confusability graph $G$. Then we have the bound $C(G,P) \le \vartheta_{sp}(G,P)$.

The quantity $\vartheta_{sp}(G,P)$ is the constant composition analog of the formal quantity $\vartheta_{sp}(G)$ defined in [3, Sec. VI].
In that case it was observed by Schrijver and by Duan and Winter [15] that in fact $\vartheta_{sp}(G) = \vartheta(G)$ (with our
logarithmic definition of $\vartheta$, see footnote 1). We have the analogous result for constant compositions.

¹We use the notation $\vartheta(G,P)$ in place of Marton's $\lambda(G,P)$ to preserve a higher coherence with the context of this paper. For the same
reason, in what follows we also use, as in [3], a logarithmic version of the ordinary Lovász function, that is, our $\vartheta$ corresponds to $\log \vartheta$ in
Lovász' notation.

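Theorem 2: For any graph $G$ and composition $P$, $\vartheta_{sp}(G,P) = \vartheta(G,P)$.

Proof: It is obvious that $\vartheta_{sp}(G,P) \le \vartheta(G,P)$, since the right hand side of (18) is obtained by restricting the
operators in the right hand side of (20) to have rank one.

We now prove the converse inequality (cf. [15]). Let $\{U_x\}$ and $F$ be a representation of $G$ and a state respectively.
Let first $|\psi\rangle \in \mathcal{H} \otimes \mathcal{H}'$ be a purification of $F$ obtained using an auxiliary space $\mathcal{H}'$, so that $\operatorname{Tr}(U_x F) = \operatorname{Tr}\big((U_x \otimes \mathbb{1}_{\mathcal{H}'}) |\psi\rangle\langle\psi|\big)$.
Let then

$$|w_x\rangle = \frac{(U_x \otimes \mathbb{1}_{\mathcal{H}'}) |\psi\rangle}{\big\| (U_x \otimes \mathbb{1}_{\mathcal{H}'}) |\psi\rangle \big\|}. \tag{21}$$

It is not difficult to check that $\{w_x\}$ is an orthonormal representation of $G$ and that $\operatorname{Tr}(U_x F) = \operatorname{Tr}\big((U_x \otimes \mathbb{1}_{\mathcal{H}'}) |\psi\rangle\langle\psi|\big) = |\langle w_x | \psi \rangle|^2$, for all $x$. Hence, the orthonormal representation $\{w_x\}$ and the unit norm vector $\psi$ satisfy

$$\sum_x P(x) \log \frac{1}{|\langle w_x | \psi \rangle|^2} = \sum_x P(x) \log \frac{1}{\operatorname{Tr}(U_x F)}, \tag{22}$$

which implies that $\vartheta(G,P) \le \vartheta_{sp}(G,P)$. ■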

We can now discuss another interesting issue about the use of the quantity $\vartheta(G,P)$. When we are interested in
bounding $C_0$, we can use the bound $C_0 \le \vartheta(G)$ or we can also use the bound² $C_0 \le \max_P \vartheta(G,P)$. Marton [10]
states that this does not make a difference since - "as is easily seen" - $\max_P \vartheta(G,P) = \vartheta(G)$. However, a proof of
this statement does not seem to follow easily from the definitions. It can in fact be written as

$$\max_P \min_{\{u_x\}, f} \sum_x P(x) \log \frac{1}{|\langle u_x | f \rangle|^2} = \min_{\{u_x\}, f} \max_x \log \frac{1}{|\langle u_x | f \rangle|^2} \tag{23}$$

$$= \min_{\{u_x\}, f} \max_P \sum_x P(x) \log \frac{1}{|\langle u_x | f \rangle|^2}, \tag{24}$$

and, in order to prove the equality, we would need to exchange the maximization over $P$ with the minimization
over representations and handles. It is not clear in Marton's paper what argument she used to motivate it. We use
Theorem 2 to prove this statement.

Theorem 3: For any graph $G$, $\max_P \vartheta(G,P) = \vartheta(G)$.

²Note that $C_0 = \max_P C_0(P)$, since the number of compositions is polynomial in the block-length.
Proof: For any representation $\{U_x\}$ of $G$ and density operator $F$, define the function $f(x) = \operatorname{Tr} U_x F$, and denote
the set of all functions $f$ obtained in this way by $\mathrm{OR}(G)$. The proof of Theorem 2 shows that any $f \in \mathrm{OR}(G)$ can
be realized by rank-one projections $U_x = |u_x\rangle\langle u_x|$ and a pure state $F = |f\rangle\langle f|$, in a space of dimension at most $|\mathcal{X}|$
(namely the span of the $|u_x\rangle$). In particular, it follows that $\mathrm{OR}(G)$ is closed and compact.

Furthermore, it is convex: namely, consider $f_i(x) = \operatorname{Tr} U_x^{(i)} F^{(i)}$ for representations $\{U_x^{(i)}\}$ of $G$ and density
operators $F^{(i)}$, $i = 1,2$. Then, for $0 \le p \le 1$, let $U_x = U_x^{(1)} \oplus U_x^{(2)}$ and $F = pF^{(1)} \oplus (1-p)F^{(2)}$, which has associated
$f(x) = \operatorname{Tr} U_x F = p f_1(x) + (1-p) f_2(x)$, i.e. $p f_1 + (1-p) f_2 \in \mathrm{OR}(G)$.

Now define the quantity

$$J(f,P) = \sum_x P(x) \log \frac{1}{f(x)} \tag{25}$$

for compositions $P$ and functions $f \in \mathrm{OR}(G)$. The theorem is equivalent to the statement that

$$\max_P \min_{f \in \mathrm{OR}(G)} J(f,P) = \min_{f \in \mathrm{OR}(G)} \max_P J(f,P), \tag{26}$$

since the left hand side equals $\max_P \vartheta(G,P)$ by Theorem 2 and the right hand side equals $\vartheta(G)$ by [3, Th. 8].

But (26) is an instance of the minimax theorem. Indeed, both the domains of $f$ and $P$ are convex and compact,
and the functional $J$ is convex in the former and concave (in fact affine linear) in the latter. ■

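We close this section with a simple yet useful result which we will need in the next section. This is the analog
of [3, Th. 10] for the constant composition setting.

Theorem 4: For any pure-state channel we have the inequality $E_{sp}^{cc}(R_\infty(P),P) \le R_\infty(P)$.

Proof: For a pure state channel, since $S_x^{1/(1+\rho)} = S_x = S_x^0$, we have

$$\begin{aligned}
E_0^{cc}(\rho,P) &= \min_F \Big[ -(1+\rho) \sum_x P(x) \log \operatorname{Tr}\big( S_x^{1/(1+\rho)} F^{\rho/(1+\rho)} \big) \Big] \\
&= \min_F \Big[ -(1+\rho) \sum_x P(x) \log \operatorname{Tr}\big( S_x F^{\rho/(1+\rho)} \big) \Big] \\
&\le \min_F \Big[ -(1+\rho) \sum_x P(x) \log \operatorname{Tr}\big( S_x^0 F \big) \Big] \\
&= (1+\rho) R_\infty(P),
\end{aligned}$$

from which we easily deduce the statement by definition of $E_{sp}^{cc}(R,P)$. ■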


V. Conditional Compositions 

A. Conditional Sphere Packing Bound 

We now develop an extension of the sphere packing bound to handle the case of varying channels with a conditional
composition constraint on the codewords. Although this setting can appear artificial, the bound will prove useful
when applied to auxiliary channels in a procedure that can be considered as an evolution of the method used
in [3, Sec. VIII] along the same lines taken in [?]. Here we assume that we have a finite set $\mathcal{A}$ of possible
states and a different channel $\mathfrak{C}_a$ for each state $a \in \mathcal{A}$. The communication is governed by a sequence of states
$\boldsymbol{a} = (a_1,\ldots,a_n) \in \mathcal{A}^n$ (known to both encoder and decoder) with composition $P_n$, which determines the channels to
use. In particular, channel $\mathfrak{C}_{a_i}$ is used at time instant $i$. The composition constraint in this case is that all codewords
have conditional composition $V_n$ given $\boldsymbol{a}$, which means that any codeword has a symbol $x$ in a fraction $V_n(x|a)$
of the $nP_n(a)$ positions where $a_i = a$. We then assume that, as $n \to \infty$, $P_n \to P$ and $V_n \to V$.
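The conditional composition constraint can be made concrete with a short sketch. The function below computes, for a codeword and a state sequence of the same length, the fraction of positions with a given state that carry a given symbol; the state labels, the codewords and the function name are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def conditional_composition(x, a):
    """Return {(symbol, state): V_n(symbol|state)} for codeword x and state sequence a."""
    state_counts = Counter(a)                 # n * P_n(state)
    pair_counts = Counter(zip(x, a))          # joint counts of (symbol, state)
    return {
        (sym, st): Fraction(cnt, state_counts[st])
        for (sym, st), cnt in pair_counts.items()
    }


# Hypothetical state sequence over {'u', 'v'} and three codewords over {1, 2}.
a = ('u', 'u', 'v', 'v', 'v', 'v')
x1 = (1, 2, 1, 1, 2, 2)
x2 = (2, 1, 2, 1, 2, 1)
x3 = (1, 1, 1, 2, 2, 2)

V1 = conditional_composition(x1, a)
V2 = conditional_composition(x2, a)
V3 = conditional_composition(x3, a)
```

In this example all three codewords have the same ordinary composition (three symbols of each kind), but only the first two share the same conditional composition given the state sequence, which is the stronger requirement used in this section.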

Remark 5: Note that this general scenario includes the ordinary constant composition situation considered before,
which is obtained for example when $P(a) = 1$ for some $a$ and $\boldsymbol{a} = (a,a,\ldots,a)$. Note that it also includes the study
of the parallel use of $K > 1$ channels, which can be recovered by setting $P(a) = 1/K$, $\forall a$, and normalizing the block
lengths by a factor $K$.

For a given $P$ and $V$, let now $E(\{\mathfrak{C}_a\},R,V|P)$ be the optimal asymptotic error exponent achievable by codes
with asymptotic conditional composition $V$ with respect to a sequence with asymptotic composition $P$ using the
set of channels $\{\mathfrak{C}_a\}$, $a \in \mathcal{A}$. Then we have the following result.

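Theorem 6: We have the inequality

$$E(\{\mathfrak{C}_a\},R,V|P) \le E_{sp}^{cc}(\{\mathfrak{C}_a\},R-\varepsilon,V|P), \tag{27}$$

where $E_{sp}^{cc}(\{\mathfrak{C}_a\},R,V|P)$ is defined by

$$E_{sp}^{cc}(\{\mathfrak{C}_a\},R,V|P) = \sup_{\rho \ge 0} \left[ E_0^{cc}(\{\mathfrak{C}_a\},\rho,V|P) - \rho R \right], \tag{28}$$

$$E_0^{cc}(\{\mathfrak{C}_a\},\rho,V|P) = \sum_a P(a) E_0^{cc}(\mathfrak{C}_a,\rho,V(\cdot|a)), \tag{29}$$

and $E_0^{cc}(\mathfrak{C}_a,\rho,V(\cdot|a))$ is the coefficient $E_0^{cc}$ of the sphere packing bound for channel $\mathfrak{C}_a$ with composition $V(\cdot|a)$,
as defined in (3).

Proof: See Appendix B. ■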

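We observe that the function $E_{sp}^{cc}(\{\mathfrak{C}_a\},R,V|P)$ is finite for all rates $R > R_\infty(\{\mathfrak{C}_a\},V|P)$ where

$$R_\infty(\{\mathfrak{C}_a\},V|P) = \lim_{\rho \to \infty} \frac{E_0^{cc}(\{\mathfrak{C}_a\},\rho,V|P)}{\rho} \tag{30}$$

$$= \lim_{\rho \to \infty} \sum_a P(a) \frac{E_0^{cc}(\mathfrak{C}_a,\rho,V(\cdot|a))}{\rho} \tag{31}$$

$$= \sum_a P(a) R_\infty(\mathfrak{C}_a,V(\cdot|a)). \tag{32}$$

Furthermore, it is not difficult to show, using the same procedure used in Theorem 4, that for pure-state channels
we have the inequality

$$E_{sp}^{cc}(\{\mathfrak{C}_a\},R_\infty(\{\mathfrak{C}_a\},V|P),V|P) \le R_\infty(\{\mathfrak{C}_a\},V|P). \tag{33}$$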


B. Improvement of the Sphere-Packed Umbrella Bound 

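We can now combine the bound derived above with the ideas presented in [?], [3] and [11], much in the same
way as done in [?], to obtain a bound on the reliability of a channel $\mathfrak{C}$ using auxiliary classical-quantum
channels $\{\tilde{\mathfrak{C}}_a\}$. We limit here the discussion to the case of a pure-state channel with states $S_x = |\psi_x\rangle\langle\psi_x|$ and
pure-state auxiliary channels $\{\tilde{\mathfrak{C}}_a\}$. The general case will become clear in the next section where we reformulate
this bound in terms of code distances, reinterpreting it as a generalization of the Elias bound.

For a $\rho \ge 1$, we define the set $\Gamma(\rho)$ of admissible pure-state auxiliary channels $\tilde{\mathfrak{C}}$ with states $\tilde{S}_x = |\tilde\psi_x\rangle\langle\tilde\psi_x|$ such
that

$$|\langle \tilde\psi_x | \tilde\psi_{x'} \rangle| \le |\langle \psi_x | \psi_{x'} \rangle|^{1/\rho}, \quad \forall x, x' \in \mathcal{X}. \tag{34}$$

For any $a \in \mathcal{A}$ we choose an auxiliary pure state channel $\tilde{\mathfrak{C}}_a \in \Gamma(\rho)$ with states $\tilde{S}_{a,x} = |\tilde\psi_{a,x}\rangle\langle\tilde\psi_{a,x}|$. Given a sequence
$\boldsymbol{a} = (a_1,\ldots,a_n) \in \mathcal{A}^n$ and a sequence $\boldsymbol{x} = (x_1,\ldots,x_n) \in \mathcal{X}^n$, let

$$\tilde\psi_{\boldsymbol{a},\boldsymbol{x}} = \tilde\psi_{a_1,x_1} \otimes \cdots \otimes \tilde\psi_{a_n,x_n}. \tag{35}$$

Now, given two sequences $\boldsymbol{x} = (x_1,\ldots,x_n)$ and $\boldsymbol{x}' = (x'_1,\ldots,x'_n)$, we can use these auxiliary channels to bound the
overlap as

$$|\langle \psi_{\boldsymbol{x}} | \psi_{\boldsymbol{x}'} \rangle|^2 \ge |\langle \tilde\psi_{\boldsymbol{a},\boldsymbol{x}} | \tilde\psi_{\boldsymbol{a},\boldsymbol{x}'} \rangle|^{2\rho}. \tag{36}$$

This will allow us to bound $E(R,P)$ for the original channel using the bound (see for example [3, Th. 12])

$$E(R,P) \le -\frac{1}{n} \log \max_{m \ne m'} |\langle \psi_{\boldsymbol{x}_m} | \psi_{\boldsymbol{x}_{m'}} \rangle|^2 + o(1) \tag{37}$$

$$\le -\frac{\rho}{n} \log \max_{m \ne m'} |\langle \tilde\psi_{\boldsymbol{a},\boldsymbol{x}_m} | \tilde\psi_{\boldsymbol{a},\boldsymbol{x}_{m'}} \rangle|^2 + o(1). \tag{38}$$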

We could use the extension of the sphere packing bound considered in this section to upper bound the right hand
side of the last equation as done in [3, Sec. VIII] if all codewords $\boldsymbol{x}_m$ had the same conditional composition given
the sequence $\boldsymbol{a}$. Since the sequence $\boldsymbol{a}$ is arbitrary, we choose it so that this condition is met by at least a large
enough subset $\mathcal{T}$ of codewords, and we only apply the sphere packing bound to this subset $\mathcal{T}$. In order to do this,
we adopt an idea proposed by Blahut [17] in a generalization of the Elias bound and already considered for a
further generalization in [?], [?].

Given a code with $M = e^{nR_n}$ codewords of composition $P_n$, assume that there exists a conditional composition
$V_n(a|x) : \mathcal{X} \to \mathcal{A}$ (i.e., $nP_n(x)V_n(a|x)$ is an integer) such that

$$R_n > I(P_n,V_n), \tag{39}$$

where $I(P_n,V_n)$ is the mutual information with the notation of [7]. Define then

$$\hat{P}_n(a) = \sum_x P_n(x) V_n(a|x) \tag{40}$$

(that we will write as $P_n V_n = \hat{P}_n$) and let $\hat{V}_n(x|a) = P_n(x)V_n(a|x)/\hat{P}_n(a)$, so that $\hat{P}_n \hat{V}_n = P_n$. Note that
$I(\hat{P}_n,\hat{V}_n) = I(P_n,V_n)$.

Then (see [11, proof of Th. 8] or [18, Lemma 3]) there is at least one sequence $\boldsymbol{a}$ of composition $\hat{P}_n$ such
that there is a subset $\mathcal{T}$ of at least $|\mathcal{T}| = e^{n(R_n - I(P_n,V_n) - o(1))}$ codewords with conditional composition $\hat{V}_n$ given $\boldsymbol{a}$.

Since we are interested in the limit as $n \to \infty$, we directly work with the asymptotic rate $R$, compositions $P$ and
$\hat{P}$ and matrix $V$, and we neglect the constraint that $nP_n(x)$, $nP_n(x)V_n(a|x)$ etc. are integers.
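The reversal step used above (from a conditional distribution of states given symbols to a conditional distribution of symbols given states) is an application of Bayes' rule, and the mutual information is unchanged by it. The following sketch checks both facts numerically on a small example; the distributions and function names are hypothetical, and the hatted quantities of the text appear as `P_hat` and `V_hat`.

```python
import math


def mutual_information(P, V):
    """I(P,V) in nats, with V[i][j] the probability of output j given input i."""
    n_out = len(V[0])
    marginal = [sum(P[i] * V[i][j] for i in range(len(P))) for j in range(n_out)]
    return sum(
        P[i] * V[i][j] * math.log(V[i][j] / marginal[j])
        for i in range(len(P))
        for j in range(n_out)
        if P[i] > 0 and V[i][j] > 0
    )


def reverse(P, V):
    """Marginal on the states and the reversed conditional obtained by Bayes' rule."""
    n_states = len(V[0])
    P_hat = [sum(P[x] * V[x][a] for x in range(len(P))) for a in range(n_states)]
    V_hat = [
        [P[x] * V[x][a] / P_hat[a] for x in range(len(P))]
        for a in range(n_states)
    ]
    return P_hat, V_hat


# Hypothetical example: three input symbols, two states.
P = [0.5, 0.3, 0.2]
V = [[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]]
P_hat, V_hat = reverse(P, V)

# Mixing the reversed conditional with the state marginal gives back P.
recovered = [sum(P_hat[a] * V_hat[a][x] for a in range(2)) for x in range(3)]
```

The sketch assumes that every state has positive marginal probability, so that the division in Bayes' rule is well defined.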
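Now, we can use the conditional sphere packing bound introduced in this section to bound the probability of
error of the subcode $\mathcal{T}$ of rate $\tilde{R} = R - I(P,V) - o(1)$ used over the varying channel $\tilde{\mathfrak{C}}_{a_1},\ldots,\tilde{\mathfrak{C}}_{a_n}$. For these
codewords used over this varying channel, there is a decision rule such that ([19], [3, Sec. VIII])

$$\tilde{P}_{e,\max} \le (|\mathcal{T}|-1) \max_{m \ne m'} |\langle \tilde\psi_{\boldsymbol{a},\boldsymbol{x}_m} | \tilde\psi_{\boldsymbol{a},\boldsymbol{x}_{m'}} \rangle|^2 \tag{41}$$

$$\le e^{n(R - I(P,V) + o(1))} \max_{m \ne m'} |\langle \tilde\psi_{\boldsymbol{a},\boldsymbol{x}_m} | \tilde\psi_{\boldsymbol{a},\boldsymbol{x}_{m'}} \rangle|^2. \tag{42}$$

On the other hand, as $n \to \infty$, Theorem 6 with rate $\tilde{R}$ gives

$$-\frac{1}{n} \log \tilde{P}_{e,\max} \le E_{sp}^{cc}(\{\tilde{\mathfrak{C}}_a\},\tilde{R}-\varepsilon,\hat{V}|\hat{P}) + o(1) \tag{43}$$

$$\le E_{sp}^{cc}(\{\tilde{\mathfrak{C}}_a\},R - I(P,V) - \varepsilon,\hat{V}|\hat{P}) + o(1). \tag{44}$$

Putting together equations (38), (42) and (44), we obtain

$$E(R,P) \le \rho \left[ E_{sp}^{cc}(\{\tilde{\mathfrak{C}}_a\},R - I(P,V) - \varepsilon,\hat{V}|\hat{P}) + R - I(P,V) \right]. \tag{45}$$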

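Since the choice of $\rho$, of the channels $\{\tilde{\mathfrak{C}}_a\} \subset \Gamma(\rho)$ and of the distributions $\hat{P}$, $\hat{V}$ can be optimized, we have, in
analogy with [3, Th. 11],

Theorem 7: For a pure-state channel, the reliability function with constant composition $P$ satisfies $E(R,P) \le E_{spu}^{cc}(R,P)$ where

$$E_{spu}^{cc}(R,P) = \inf \rho \left[ E_{sp}^{cc}(\{\tilde{\mathfrak{C}}_a\},R - I(P,V) - \varepsilon,\hat{V}|\hat{P}) + R - I(P,V) \right], \tag{46}$$

the infimum being over $\varepsilon > 0$, $\rho \ge 1$, auxiliary pure-state channels $\tilde{\mathfrak{C}}_a \in \Gamma(\rho)$, and auxiliary distributions $\hat{P}$ and $\hat{V}$
such that $\hat{P}\hat{V} = P$.

Remark 8: Note that for the choice $\mathcal{A} = \mathcal{X}$, $V(a|x) = P(a)$, $\forall a$, we have $I(P,V) = 0$. We can also notice that
the optimization of the channels $\tilde{\mathfrak{C}}_a$ will give $\tilde{\mathfrak{C}}_a = \tilde{\mathfrak{C}}$, $\forall a$, for an optimal $\tilde{\mathfrak{C}}$. With this constraint on $V$, the bound
$E_{spu}^{cc}(R,P)$ is weakened to

$$\inf \rho \left[ E_{sp}^{cc}(\tilde{\mathfrak{C}},R-\varepsilon,P) + R \right], \tag{47}$$

where the infimum is now only over $\rho \ge 1$ and $\tilde{\mathfrak{C}} \in \Gamma(\rho)$. This is a constant composition version of the bound in [3,
Th. 11].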


C. Connection with the Elias Bound 

In the same way as [3, Th. 11] generalizes the results of [3, Sec. III], it is possible to reinterpret the idea used to
obtain Theorem 7 as a generalization of the Elias bound presented in [11] and [18]. For this purpose, it is useful
to introduce a notion of distance between symbols and distance between sequences, and then restate our bound as
a bound on the minimum distance of codes. Finally, bounds on the reliability function can be obtained by relating
the minimum distance to the probability of error (see [12, Sec. VI] for details).
Let $d$ be a function $d : \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+ \cup \{\infty\}$ such that

$$d(x,x') \ge 0,$$

$$d(x,x') = d(x',x), \quad \forall x,x',$$

$$d(x,x) = 0.$$

We call this function $d$ a "distance" although, as seen above, we do not really require all the properties of a distance.
We stress that $d$ is allowed to take value $\infty$ for some pairs of symbols, a case which is of practical interest in our
context. We extend the distance to sequences of symbols defining, for $\boldsymbol{x} = (x_1,\ldots,x_n)$ and $\boldsymbol{x}' = (x'_1,\ldots,x'_n)$,

$$d(\boldsymbol{x},\boldsymbol{x}') := \sum_{i=1}^n d(x_i,x'_i). \tag{48}$$

Note in particular that $d(\boldsymbol{x},\boldsymbol{x}') = \infty$ iff $d(x_i,x'_i) = \infty$ for at least one $i$.

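For a given code $\mathcal{C}$, we define its minimum distance as

$$d_{\min}(\mathcal{C}) := \min_{\boldsymbol{x},\boldsymbol{x}' \in \mathcal{C},\, \boldsymbol{x} \ne \boldsymbol{x}'} d(\boldsymbol{x},\boldsymbol{x}'). \tag{49}$$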
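The additive extension of the symbol distance to sequences and the minimum distance of a code can be sketched in a few lines. The symbol distance table and the toy code below are hypothetical; infinite values are represented with `math.inf`, which propagates through the sum as required.

```python
import math
from itertools import combinations

INF = math.inf

# Hypothetical symbol "distance" on {0, 1, 2}: symmetric, zero on the diagonal,
# and infinite for the pair (0, 2).
d = {
    0: {0: 0.0, 1: 1.0, 2: INF},
    1: {0: 1.0, 1: 0.0, 2: 1.0},
    2: {0: INF, 1: 1.0, 2: 0.0},
}


def seq_distance(x, y, d):
    """Sum of the symbol distances over all positions of two equal-length sequences."""
    return sum(d[a][b] for a, b in zip(x, y))


def min_distance(code, d):
    """Smallest sequence distance over all pairs of distinct codewords."""
    return min(seq_distance(x, y, d) for x, y in combinations(code, 2))


code = [(0, 0, 1), (1, 0, 1), (2, 1, 1)]
d01 = seq_distance(code[0], code[1], d)
d02 = seq_distance(code[0], code[2], d)
d12 = seq_distance(code[1], code[2], d)
d_min = min_distance(code, d)
```

A pair of codewords is at infinite distance as soon as one position holds a pair of symbols at infinite distance, which is the situation of interest for zero-error problems.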

For a composition $P$, we define

$$d(R,n,P) := \max d_{\min}(\mathcal{C}), \tag{50}$$

where the maximum is over all codes of length $n$, rate at least $R$, and composition $P$. For a fixed $R$, we then define

$$\delta^*(R,P) := \limsup_{n \to \infty, \{P_n\}} \frac{1}{n} d(R_n,n,P_n), \tag{51}$$

where $R_n \to R$ and $P_n \to P$ as $n \to \infty$.

Note that we can drop the constant composition constraint defining

$$d(R,n) := \max d_{\min}(\mathcal{C}), \tag{52}$$

and, correspondingly,

$$\delta^*(R) := \limsup_{n \to \infty} \frac{1}{n} d(R,n). \tag{53}$$

Then we have

$$\delta^*(R) = \max_P \delta^*(R,P). \tag{54}$$

We want to use our results to bound the quantity $\delta^*(R,P)$. In order to do this we proceed in a similar way as done
in Section V-B. Note that this corresponds to what is done in [18] with two variations: 1) we use general auxiliary
classical-quantum channels in place of the so called representations composed of vectors, and 2) we replace the
Lovász-like trick of [18, Lemma 2] with the sphere packing bound.

Given the distance $d$ and a $\rho \ge 1$, we define the set $\Gamma(\rho)$ of admissible auxiliary channels $\tilde{\mathfrak{C}}$ with states $\tilde{S}_x$ such
that

$$\operatorname{Tr} \sqrt{ \sqrt{\tilde{S}_x}\, \tilde{S}_{x'} \sqrt{\tilde{S}_x} } \le e^{-d(x,x')/\rho}. \tag{55}$$
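We then consider again as in Section V-B the subcode $\mathcal{T}$ of codewords with composition $P_n$ all with the same
conditional composition $\hat{V}_n$ given the sequence $\boldsymbol{a}$. For any $a \in \mathcal{A}$ we choose an auxiliary channel $\tilde{\mathfrak{C}}_a \in \Gamma(\rho)$ with
states $\tilde{S}_{a,x}$ and for an $\boldsymbol{x} \in \mathcal{T}$ we define

$$\tilde{S}_{\boldsymbol{a},\boldsymbol{x}} = \tilde{S}_{a_1,x_1} \otimes \cdots \otimes \tilde{S}_{a_n,x_n}. \tag{56}$$

Note that this implies that for two sequences $\boldsymbol{x}$ and $\boldsymbol{x}'$,

$$\operatorname{Tr} \sqrt{ \sqrt{\tilde{S}_{\boldsymbol{a},\boldsymbol{x}}}\, \tilde{S}_{\boldsymbol{a},\boldsymbol{x}'} \sqrt{\tilde{S}_{\boldsymbol{a},\boldsymbol{x}}} } \le e^{-d(\boldsymbol{x},\boldsymbol{x}')/\rho}. \tag{57}$$

Consider now an optimal decision scheme for the states associated to the subcode $\mathcal{T}$, that is, $\tilde{S}_{\boldsymbol{a},\boldsymbol{x}}$, $\boldsymbol{x} \in \mathcal{T}$. The
extension of (42) [19] says that for such a set of states, there exists a measurement such that

$$\tilde{P}_{e,\max} \le e^{n(R - I(P,V) + o(1))} \max_{m \ne m'} \operatorname{Tr} \sqrt{ \sqrt{\tilde{S}_{\boldsymbol{a},\boldsymbol{x}_m}}\, \tilde{S}_{\boldsymbol{a},\boldsymbol{x}_{m'}} \sqrt{\tilde{S}_{\boldsymbol{a},\boldsymbol{x}_m}} }. \tag{58}$$

But, again, we can use the conditional sphere packing bound to lower bound the probability of error of the subcode
$\mathcal{T}$ as

$$-\frac{1}{n} \log \tilde{P}_{e,\max} \le E_{sp}^{cc}(\{\tilde{\mathfrak{C}}_a\},R - I(P,V) - \varepsilon,\hat{V}|\hat{P}) + o(1). \tag{59}$$

Combining equations (57), (58) and (59) we obtain

$$\frac{1}{n} \min_{m \ne m'} d(\boldsymbol{x}_m,\boldsymbol{x}_{m'}) \le \rho \left( E_{sp}^{cc}(\{\tilde{\mathfrak{C}}_a\},R - I(P,V) - \varepsilon,\hat{V}|\hat{P}) + R - I(P,V) \right) + o(1), \tag{60}$$

which asymptotically gives the following result.

Theorem 9: For a distance $d$ and assuming the above definitions, we have the inequality

$$\delta^*(R,P) \le E_{spu}^{cc}(R,P), \tag{61}$$

where $E_{spu}^{cc}(R,P)$ is defined in (46).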

As mentioned, this bound is an extension of [18, Th. 6]. To see this, we can consider the particular case in which
we restrict the attention to pure-state auxiliary channels with states $\tilde{S}_{a,x} = |\tilde\psi_{a,x}\rangle\langle\tilde\psi_{a,x}|$ and then study the smallest
rate for which the bound $E_{spu}^{cc}(R,P)$ (with this additional constraint) is finite. First note that for fixed channels
$\{\tilde{\mathfrak{C}}_a\}$, distributions $\hat{P}$ and $\hat{V}$, and $\varepsilon$ sufficiently small, the quantity on the right hand side of equation (46) is finite
for $R > R_\infty(\{\tilde{\mathfrak{C}}_a\},\hat{V}|\hat{P}) + I(P,V)$. Furthermore, when $R$ approaches this value from the right, using equation (33),
the right hand side of equation (46) is upper bounded by $2\rho R_\infty(\{\tilde{\mathfrak{C}}_a\},\hat{V}|\hat{P})$. So, for $R > R_\infty(\{\tilde{\mathfrak{C}}_a\},\hat{V}|\hat{P}) + I(P,V)$
we have the bound

$$\delta^*(R,P) \le 2\rho R_\infty(\{\tilde{\mathfrak{C}}_a\},\hat{V}|\hat{P}). \tag{62}$$
For pure state auxiliary channels we can write

$$R_\infty(\{\tilde{\mathfrak{C}}_a\},\hat{V}|\hat{P}) = \sum_a \hat{P}(a) R_\infty(\tilde{\mathfrak{C}}_a,\hat{V}(\cdot|a)) \tag{63}$$

$$= \sum_a \hat{P}(a) \min_{F_a} \Big[ -\sum_x \hat{V}(x|a) \log \operatorname{Tr}(\tilde{S}_{a,x}^0 F_a) \Big] \tag{64}$$

$$= \min_{\{F_a\}} \sum_{a,x} \hat{P}(a)\hat{V}(x|a) \log \frac{1}{\operatorname{Tr}(\tilde{S}_{a,x} F_a)} \tag{65}$$

$$\le \min_{\{f_a\}} \sum_{a,x} \hat{P}(a)\hat{V}(x|a) \log \frac{1}{|\langle \tilde\psi_{a,x} | f_a \rangle|^2}, \tag{66}$$

where in the last step we have enforced minimization over rank one operators $F_a = |f_a\rangle\langle f_a|$. Optimizing now over $\rho$,
$\hat{P}$ and $\hat{V}$ such that $\hat{P}\hat{V} = P$, and the auxiliary vectors $\{\tilde\psi_{a,x}\}$, and comparing with the definition of $\vartheta(\rho,V|P)$ used
in [18], we deduce that the bound of Theorem 9 includes, as a particular case, the bound presented in [18, Th. 6]
as a generalization of the Elias bound for general, possibly infinite, distances³. Hence, it includes in particular all
previously known extensions as discussed in [?].
Appendix A
Proof of Theorem 1

The structure of the proof is the same as in [4] and [3, Th. 5], with some technical changes which are required
for dealing with general compositions. While introducing these changes, we also considerably simplify some of the
technicalities with respect to [3, Th. 5] in order to give a simpler yet more transparent proof of both this and the
original theorem.

From the definition of $E(R,P)$, there exists a sequence of codes of block-lengths $n = 1,2,\ldots$ with rates $R_n \to R$,
compositions $P_n \to P$ and with probabilities of error $P_{e,\max}^{(n)}$ such that

$$E(R,P) = \limsup_{n \to \infty} -\frac{1}{n} \log P_{e,\max}^{(n)}.$$

We first observe that we can just focus on the subset of input symbols with $P(x) > 0$ and assume without loss of
generality that $P_n(x) = 0$ if $P(x) = 0$. This technicality is needed after equation (76) below and can be motivated
as follows. Let $\mathcal{X}_0$ be the subset of $\mathcal{X}$ such that $P(x) = 0$ if and only if $x \in \mathcal{X}_0$. Then, for any sequence of
compositions $P_n \to P$, for all $x \in \mathcal{X}_0$ we can write that $P_n(x) \le \varepsilon_n/|\mathcal{X}_0|$, where $\varepsilon_n \to 0$ as $n \to \infty$. Any codeword with
composition $P_n$ will contain symbols in $\mathcal{X}_0$ in at most $n\varepsilon_n$ positions. There are only nearly $e^{nH(\varepsilon_n)}$ choices for these
positions, where $H$ denotes the binary entropy function, and for each such choice there are only at most $|\mathcal{X}_0|^{n\varepsilon_n}$ possible combinations of symbols in $\mathcal{X}_0$. Hence,
from a code with rate $R_n$ and composition $P_n$ we can extract a subcode with rate $R'_n = R_n - H(\varepsilon_n) - \varepsilon_n \log |\mathcal{X}_0|$
such that each symbol in $\mathcal{X}_0$ appears precisely in the same positions in all codewords. We can then bound $E(R,P)$
by bounding the probability of error for this subcode since, given that $\varepsilon_n \to 0$, we have $(R'_n - R_n) \to 0$. However,
in the chosen subcode each symbol in $\mathcal{X}_0$ appears in the same positions in all codewords, and can thus be replaced
with any symbol in $\mathcal{X} \setminus \mathcal{X}_0$ without affecting the probability of error.

³Note that the definition of $\Gamma(\rho)$ in [18] is slightly different than here, so that the parameter $\rho$ here corresponds to the parameter $\rho/2$ there.

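For every fixed $n$, the idea is again as in previous proofs to consider a binary hypothesis test between a properly
selected code signal $S_{\boldsymbol{x}_m}$ and an auxiliary density operator $\boldsymbol{F} = F^{\otimes n}$. The main difference with respect to [3, Th.
5] is in the choice of $F$ and, as a consequence, in some technical details.

Let $n$ be fixed and let $M$ be the number of codewords, that is $M = e^{nR_n}$. For any $m = 1,\ldots,M$ consider a binary
hypothesis test between $S_{\boldsymbol{x}_m}$ and an auxiliary state $\boldsymbol{F} = F^{\otimes n}$. We assume that the supports of the two operators
are not disjoint and, with the notation used in [3], we define the quantity

$$\mu(s) = \mu_{S_{\boldsymbol{x}_m},\boldsymbol{F}}(s) = \log \operatorname{Tr} S_{\boldsymbol{x}_m}^{1-s} \boldsymbol{F}^{s}. \tag{67}$$

Note that, setting

$$\mu_{S_x,F}(s) = \log\left( \operatorname{Tr} S_x^{1-s} F^{s} \right), \tag{68}$$

we can write

$$\mu_{S_{\boldsymbol{x}_m},\boldsymbol{F}}(s) = \log \prod_{i=1}^{n} \operatorname{Tr} S_{x_{m,i}}^{1-s} F^{s} \tag{69}$$

$$= \log \prod_x \left( \operatorname{Tr} S_x^{1-s} F^{s} \right)^{nP_n(x)} \tag{70}$$

$$= n \sum_x P_n(x) \mu_{S_x,F}(s). \tag{71}$$

Applying [3, Th. 4], we find that for each $s$ in $(0,1)$, either

$$\operatorname{Tr}\left[ (\mathbb{1} - \Pi_m) S_{\boldsymbol{x}_m} \right] > \frac{1}{8} \exp\left[ \mu(s) - s\mu'(s) - s\sqrt{2\mu''(s)} \right]$$

or

$$\operatorname{Tr}\left[ \Pi_m \boldsymbol{F} \right] > \frac{1}{8} \exp\left[ \mu(s) + (1-s)\mu'(s) - (1-s)\sqrt{2\mu''(s)} \right].$$

As in [3, Th. 5], this can be converted into a relation between $P_{e,\max}^{(n)}$ and $R_n$ in the form that either

$$P_{e,\max}^{(n)} > \frac{1}{8} \exp\left[ \mu(s) - s\mu'(s) - s\sqrt{2\mu''(s)} \right]$$

or

$$R_n < -\frac{1}{n} \left[ \mu(s) + (1-s)\mu'(s) - (1-s)\sqrt{2\mu''(s)} - \log 8 \right]. \tag{72}$$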
(72) 
Note that due to (71), the right hand side of (72) only depends on $n$, $s$, $P_n$, and $F$. Let then this quantity be
called $R_n(s,P_n,F)$, that is,

$$R_n(s,P_n,F) = -\frac{1}{n} \left( \mu(s) + (1-s)\mu'(s) - (1-s)\sqrt{2\mu''(s)} - \log 8 \right). \tag{73}$$

We can use this equation to write $\mu'(s)$ in terms of $R_n(s,P_n,F)$. Using (71), we can state our conditions by saying
that either

$$R_n < R_n(s,P_n,F) \tag{74}$$

or

$$-\frac{1}{n} \log P_{e,\max}^{(n)} < -\frac{1}{1-s} \sum_x P_n(x) \mu_{S_x,F}(s) - \frac{s}{1-s} R_n(s,P_n,F) + \frac{1}{n} \left( 2s\sqrt{2\mu''(s)} + \frac{\log 8}{1-s} \right). \tag{75}$$

At this point we introduce the variation with respect to [3]. For any $F$, one of the two conditions above must
be satisfied and, in [3], the choice of $F$ was made which guaranteed the best bound for the optimal compositions
$P_n$. Here, instead, the compositions $P_n$ are forced to tend to a given composition $P$ and we have to choose $F$
accordingly. For a given $s \in (0,1)$, let $F_s$ be the operator defined by

$$F_s = \arg\min_F \Big[ -\sum_x P(x) \log\left( \operatorname{Tr} S_x^{1-s} F^{s} \right) \Big]. \tag{76}$$

Note that this choice guarantees that for all $x$ with $P(x) > 0$, $S_x$ and $F_s$ have non-disjoint supports. Since we
assumed that $P_n(x) = 0$ whenever $P(x) = 0$, the requirement that $S_{\boldsymbol{x}_m}$ and $F_s^{\otimes n}$ have non-disjoint support is satisfied
for all sequences $\boldsymbol{x}_m$ with composition $P_n$, and hence $\mu(s)$ is a finite quantity for all $s \in (0,1)$.

We will now relate the choice of $s$ to the rate $R$ and then use $F_s$ in place of $F$ for the chosen $s$ (it must be
clear, however, that $\mu'(s)$ and $\mu''(s)$ are computed by holding $F$ fixed). Note that we can write


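\[
R_n(s,P_n,F_s) = -\sum_{x} P_n(x)\left[\mu_{S_x,F_s}(s) + (1-s)\mu'_{S_x,F_s}(s)\right] + \frac{1}{\sqrt{n}}(1-s)\sqrt{2\sum_{x} P_n(x)\,\mu''_{S_x,F_s}(s)} + \frac{1}{n}\log 8. \tag{77}
\]

For any fixed $s$, the last two terms on the right hand side vanish as $n \to \infty$, and $P_n$ in the first term tends to $P$.
Hence, it is useful to define the quantity

\begin{align*}
R^*(s,P) &= \lim_{n\to\infty} R_n(s,P_n,F_s) \tag{78} \\
&= -\sum_{x} P(x)\left[\mu_{S_x,F_s}(s) + (1-s)\mu'_{S_x,F_s}(s)\right] \tag{79}
\end{align*}

and compare this quantity to the rate $R$ which we are considering, which is the limit of the $R_n$’s.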

We first observe that, for any $x$ and $F$, $\mu_{S_x,F}(s)$ is a non-positive convex function of $s$ for all $s \in (0,1)$, which
implies that for any $F$ we have

\begin{align*}
\mu_{S_x,F}(s) + (1-s)\mu'_{S_x,F}(s) &\leq \mu_{S_x,F}(1) \\
&\leq 0.
\end{align*}
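The inequality above can be spot-checked numerically. The sketch below uses randomly generated full-rank density matrices as stand-ins for $S_x$ and $F$ (an assumption made purely for illustration) and approximates $\mu'(s)$ with a central finite difference.

```python
import numpy as np


def psd_power(a, t):
    """Return a**t for a Hermitian positive semidefinite matrix a."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # remove tiny negative round-off
    return (v * w**t) @ v.conj().T


def random_density(dim, rng):
    """Random full-rank density matrix (illustrative test data only)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def mu(S, F, s):
    """mu_{S,F}(s) = log Tr(S^(1-s) F^s)."""
    return float(np.log(np.trace(psd_power(S, 1 - s) @ psd_power(F, s)).real))


rng = np.random.default_rng(1)
S, F = random_density(3, rng), random_density(3, rng)  # hypothetical S_x and F

h = 1e-5                               # finite-difference step (assumption)
grid = np.linspace(0.05, 0.95, 19)     # sample points inside (0, 1)
mu_vals = np.array([mu(S, F, s) for s in grid])
mu_prime = np.array([(mu(S, F, s + h) - mu(S, F, s - h)) / (2 * h) for s in grid])

# Tangent line at s, evaluated at 1: mu(s) + (1 - s) * mu'(s).
tangent_at_one = mu_vals + (1 - grid) * mu_prime
mu_one = mu(S, F, 1.0)                 # equals log Tr(F) = 0 for full-rank S

print(mu_vals.max(), (tangent_at_one - mu_one).max())
```

For full-rank states $\mu_{S_x,F}(1) = \log\operatorname{Tr}F = 0$, so both printed values should be non-positive up to numerical error.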


Hence, both $R^*(s,P)$ and $R_n(s,P_n,F_s)$ are non-negative quantities. Furthermore, it is not difficult to see that $F_s$ is
continuous in $s$ in the interval $0 < s < 1$, and so is $R^*(s,P)$. Hence, $R^*(s,P)$ is a continuous non-negative function
of $s$ in the interval $0 < s < 1$, and we can compare this function with the asymptotic rate $R$. We only have three
possible situations:

1) $R > \sup_{s\in(0,1)} R^*(s,P)$;

2) $R \leq \inf_{s\in(0,1)} R^*(s,P)$;

3) $\inf_{s\in(0,1)} R^*(s,P) < R \leq \sup_{s\in(0,1)} R^*(s,P)$.

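Assume case 1) is verified. Fix an arbitrary $s \in (0,1)$. Since $R_n \to R$ and $R_n(s,P_n,F_s) \to R^*(s,P) < R$, we have
$R_n > R_n(s,P_n,F_s)$ for all $n$ large enough. Hence, equation (74) is not satisfied and thus equation (75) is. Since $s$ is
fixed and $R_n(s,P_n,F_s) \geq 0$, as $n$ goes to infinity we find

\begin{align*}
\frac{1}{n}\log\frac{1}{P_{\mathrm{e,max}}} &\leq -\frac{1}{1-s}\sum_{x} P_n(x)\,\mu_{S_x,F_s}(s) - \frac{s}{1-s} R_n(s,P_n,F_s) + o(1) \tag{80} \\
&\leq -\frac{1}{1-s}\sum_{x} P_n(x)\,\mu_{S_x,F_s}(s) + o(1) \tag{81}
\end{align*}

and in the limit, since $P_n \to P$,

\[
E(R,P) \leq E_0^{cc}\left(\frac{s}{1-s}, P\right). \tag{82}
\]

Since this holds for arbitrary $s \in (0,1)$, we have

\begin{align*}
E(R,P) &\leq \lim_{s\to 0} E_0^{cc}\left(\frac{s}{1-s}, P\right) \\
&= 0,
\end{align*}

where the last step is deduced by noticing that $E_0^{cc}(\rho,P)$ is continuous at $\rho = 0$ and that the argument of the
minimization in the definition of $E_0^{cc}(\rho,P)$ is a non-negative quantity which, for $\rho = 0$, vanishes for all $F$ with full
support (note, however, that for $\rho > 0$ there is a unique optimal $F$, which makes $F_s$ well defined). This proves the
theorem in case 1) since $E_{sp}^{cc}(R-\varepsilon,P) \geq 0$.

Assume now that case 2) is satisfied, which means by definition of $R^*(s,P)$ that, for any $s \in (0,1)$, we have

\[
R \leq -\sum_{x} P(x)\left[\mu_{S_x,F_s}(s) + (1-s)\mu'_{S_x,F_s}(s)\right].
\]

Now, since $\mu_{S_x,F}(s)$ is convex and non-positive for all $F$, it is possible to observe that
$\mu_{S_x,F_s}(s) - s\mu'_{S_x,F_s}(s) \leq \mu_{S_x,F_s}(0) \leq 0$,
which implies that $-\mu'_{S_x,F_s}(s) \leq -\mu_{S_x,F_s}(s)/s$. Thus, for all $s \in (0,1)$,

\begin{align*}
R &\leq \sum_{x} P(x)\left(-\frac{1}{s}\,\mu_{S_x,F_s}(s)\right) \\
&\leq \frac{1-s}{s}\, E_0^{cc}\left(\frac{s}{1-s}, P\right).
\end{align*}

Calling now $\rho = s/(1-s)$, we find that for all $\rho > 0$

\[
R \leq \frac{E_0^{cc}(\rho,P)}{\rho}.
\]

Hence, for any $\varepsilon > 0$, we find

\begin{align*}
E_{sp}^{cc}(R-\varepsilon,P) &= \sup_{\rho\geq 0}\left(E_0^{cc}(\rho,P) - \rho(R-\varepsilon)\right) \\
&\geq \sup_{\rho\geq 0}\,(\rho\varepsilon).
\end{align*}

This means that $E_{sp}^{cc}(R-\varepsilon,P)$ is unbounded for any $\varepsilon > 0$, which obviously implies that $E(R,P) \leq E_{sp}^{cc}(R-\varepsilon,P)$
for all positive $\varepsilon$, proving the theorem in this case.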

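Finally, assume that case 3) above is satisfied. Then, for any $\varepsilon > 0$ small enough, there is an $\bar{s}$ such that $R^*(\bar{s},P) =
R - \varepsilon$. For this fixed value $\bar{s}$, since again $R_n \to R$ and $R_n(\bar{s},P_n,F_{\bar{s}}) \to R^*(\bar{s},P) = R - \varepsilon$, we have $R_n > R_n(\bar{s},P_n,F_{\bar{s}})$ for all
$n$ large enough. Hence, for $s = \bar{s}$, for all $n$ large enough equation (74) is not satisfied and thus (75) is. This implies
that, for all $n$ large enough,

\[
\frac{1}{n}\log\frac{1}{P_{\mathrm{e,max}}} < -\frac{1}{1-\bar{s}}\sum_{x} P_n(x)\,\mu_{S_x,F_{\bar{s}}}(\bar{s}) - \frac{\bar{s}}{1-\bar{s}} R_n(\bar{s},P_n,F_{\bar{s}}) + \frac{1}{n}\left(2\bar{s}\sqrt{2\mu''(\bar{s})} + \frac{\log 8}{1-\bar{s}}\right). \tag{83}
\]

In the limit as $n \to \infty$ the last term vanishes, $R_n(\bar{s},P_n,F_{\bar{s}}) \to R^*(\bar{s},P) = R - \varepsilon$ and $P_n \to P$. We thus conclude that

\begin{align*}
E(R,P) &\leq -\frac{1}{1-\bar{s}}\sum_{x} P(x)\,\mu_{S_x,F_{\bar{s}}}(\bar{s}) - \frac{\bar{s}}{1-\bar{s}}(R-\varepsilon) \\
&= E_0^{cc}\left(\frac{\bar{s}}{1-\bar{s}}, P\right) - \frac{\bar{s}}{1-\bar{s}}(R-\varepsilon) \\
&\leq \sup_{\rho\geq 0}\left(E_0^{cc}(\rho,P) - \rho(R-\varepsilon)\right) \\
&= E_{sp}^{cc}(R-\varepsilon,P).
\end{align*}

This holds for all $\varepsilon > 0$ small enough and hence, since $E_{sp}^{cc}(R,P)$ is non increasing in $R$, it holds for all $\varepsilon \in (0,R)$.
This concludes the proof.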


Appendix B
Proof of Theorem 6

The proof is obtained by introducing a variation in the proof presented in Appendix A. In particular,
we use a different operator $\mathbf{F}$ which we choose so as to take into account the state dependent structure of the
communication process.

From the hypotheses, the communication is governed by the sequence of states $\mathbf{a} = (a_1, \ldots, a_n)$ with composition
$P_n$, where $P_n \to P$, and codes are considered with conditional compositions $V_n$ given $\mathbf{a}$, where $V_n \to V$. Here
again, as in the other proof, we can assume that $V_n(x|a) = 0$ if $P(a) = 0$ or $V(x|a) = 0$. The structure of the proof
remains unchanged with the only difference that, instead of building $\mathbf{F}$ using $n$ identical copies of a single density
operator $F$, we can use $|\mathcal{A}|$ different operators $F_a$, $a \in \mathcal{A}$, to build $\mathbf{F}$ as

\[
\mathbf{F} = F_{a_1} \otimes F_{a_2} \otimes \cdots \otimes F_{a_n}. \tag{84}
\]
Then we can still use the two equations (71) and (72) to bound the probability of error as a function of the rate,
with the difference that the function $\mu(s)$ now reads



(85) 


For a given $a \in \mathcal{A}$ and fixed $s$, we then choose


F'a.s = argmin-y^T/(a:|a)log(Tr5'^ 

F ^^ 


( 86 ) 


X 


again ensuring that ,f{s) is finite. The rest of the proof follows essentially identical with the obvious differences 
due to the use of quantities E'^{€,a,p,V{-\a)) in place of used before. 

Appendix C 

A Remark on Haroutunian’s Proof of the Sphere Packing Bound

As mentioned, a greedy extension of Haroutunian’s proof of the sphere packing bound to quantum channels, as 
outlined in equation (fT^ . gives a bound which is in general weak. The reason why this happens in the quantum case 
and not in the classical one can be traced back to a fundamental difference in the solution to the quantum binary 
hypothesis testing problem in those two contexts. In fact, as seen from equations (69) and (70), the key ingredient
in the proof of the sphere packing bound is a binary hypothesis test to distinguish the state $\mathbf{S}_{\mathbf{x}_m}$ from the auxiliary
state $\mathbf{F}$. Here, a fundamental difference with the classical counterpart is related to the roles of the Kullback-Leibler
discrimination and Rényi divergence in the expression for the error exponents in binary hypothesis testing. This
difference was already observed in [20, Sec. 4, Remark 1] and [15, Sec. 4.8] and leads to the mentioned difference
in the expressions for the sphere packing bound. We discuss it here in detail for completeness.

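In a binary hypothesis test between two density operators $A$ and $B$, based on $n$ independent extractions, the
error exponents of the first and second kind can be expressed parametrically as (see [3], [15])

\begin{align*}
-\frac{1}{n}\log P_{e|A} &= -\mu(s) + s\mu'(s) + o(1) \tag{87} \\
-\frac{1}{n}\log P_{e|B} &= -\mu(s) - (1-s)\mu'(s) + o(1) \tag{88}
\end{align*}

where

\[
\mu(s) = \log \operatorname{Tr} A^{1-s} B^{s}. \tag{89}
\]

Upon differentiation, one finds

\begin{align*}
-\frac{1}{n}\log P_{e|A} &= -\log\operatorname{Tr}\left(A^{1-s}B^{s}\right) + \operatorname{Tr}\left[\frac{A^{1-s}B^{s}}{\operatorname{Tr} A^{1-s}B^{s}}\left(\log B^{s} - \log A^{s}\right)\right] + o(1) \tag{90} \\
-\frac{1}{n}\log P_{e|B} &= -\log\operatorname{Tr}\left(A^{1-s}B^{s}\right) + \operatorname{Tr}\left[\frac{A^{1-s}B^{s}}{\operatorname{Tr} A^{1-s}B^{s}}\left(\log A^{1-s} - \log B^{1-s}\right)\right] + o(1). \tag{91}
\end{align*}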
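In the classical case, $A$ and $B$ commute. We can then define the density operator $V_s = A^{1-s}B^{s}/\operatorname{Tr}A^{1-s}B^{s}$ and use the
properties $\log B^{s} - \log A^{s} = \log A^{1-s}B^{s} - \log A$ and $\log A^{1-s} - \log B^{1-s} = \log A^{1-s}B^{s} - \log B$ to obtain

\begin{align*}
-\frac{1}{n}\log P_{e|A} &= \operatorname{Tr} V_s\left(\log V_s - \log A\right) + o(1) \tag{92} \\
&= D(V_s\|A) + o(1) \tag{93}
\end{align*}

and

\begin{align*}
-\frac{1}{n}\log P_{e|B} &= \operatorname{Tr} V_s\left(\log V_s - \log B\right) + o(1) \tag{94} \\
&= D(V_s\|B) + o(1). \tag{95}
\end{align*}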
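In the commuting case the operators reduce to probability vectors, and the identities in (92)-(95) can be verified directly. The snippet below is a small sanity check with randomly drawn distributions `a` and `b` and an arbitrary value of `s`; these are illustrative choices, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.random(5)
a /= a.sum()          # classical counterpart of the density operator A
b = rng.random(5)
b /= b.sum()          # classical counterpart of the density operator B
s = 0.4

w = a ** (1 - s) * b ** s
Z = w.sum()
mu = np.log(Z)                                         # mu(s) = log sum a^(1-s) b^s
mu_prime = np.sum(w * (np.log(b) - np.log(a))) / Z     # derivative of mu at s
V = w / Z                                              # tilted distribution V_s


def kl(p, q):
    """Kullback-Leibler divergence D(p||q) for strictly positive vectors."""
    return float(np.sum(p * (np.log(p) - np.log(q))))


first_kind = -mu + s * mu_prime          # leading term of -(1/n) log P_{e|A}
second_kind = -mu - (1 - s) * mu_prime   # leading term of -(1/n) log P_{e|B}

print(first_kind - kl(V, a), second_kind - kl(V, b))
```

Both differences are zero up to round-off, which is exactly the simplification that is lost when $A$ and $B$ do not commute.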

However, if $A$ and $B$ do not commute, the above simplification is not possible. This discussion extends without
fundamental differences to the binary hypothesis test between the state $\mathbf{S}_{\mathbf{x}_m}$ and the auxiliary state $\mathbf{F}$ with the
exponents expressed as in equations (69) and (70). If we assume that all the $S_x$ operators and $F$ commute, the
exponents of the binary hypothesis test used in the sphere packing bound can be expressed in terms of Kullback-Leibler
divergences. For a given $s$, instead of a single density operator $V_s$ we will have a $V_{x,s}$ for each $x$, defined as
$V_{x,s} = S_x^{1-s}F^{s}/\operatorname{Tr}(S_x^{1-s}F^{s})$. It then turns out that the optimal $F$ to use, that is the operator $F_s$ defined in equation
(76), is such that (see [5, eq. (9.50)], [21, Cor. 3])

\[
F_s = \sum_{x} P(x)\, V_{x,s} \tag{96}
\]

and this leads exactly to the usual expression of the sphere packing bound in terms of Kullback-Leibler expressions
as in Haroutunian’s proof (see in particular [6, eq. (19)] and [5, eqs. (9.23), (9.24)]). In the non commutative case,
however, this simplification is not possible and this implies that we cannot express the sphere packing bound using
the Kullback-Leibler divergence in the standard way.